Activation of cryptic biosynthetic gene clusters by fungal artificial chromosomes to produce novel secondary metabolites

In 2017, we reported the discovery of Berkeleylactone A (BPLA), a novel, potent antibiotic produced exclusively in co-culture by two extremophilic fungi, Penicillium fuscum and P. camembertii/clavigerum, which were isolated from the Berkeley Pit, an acid mine waste lake, in Butte, Montana. Neither fungus synthesized BPLA when grown in axenic culture. Recent studies suggest that secondary metabolites (SMs) are often synthesized by enzymes encoded by co-localized genes that form “biosynthetic gene clusters” (BGCs), which might remain silent (inactive) under various fermentation conditions. Fungi may also harbor cryptic BGCs that are not associated with previously characterized molecules. We turned to the tools of Fungal Artificial Chromosomes (FAC)-Next-Gen-Sequencing (NGS) to understand how co-culture activated cryptic biosynthesis of BPLA and several related berkeleylactones and to further investigate the true biosynthetic potential of these two fungi. FAC-NGS enables the capture of BGCs as individual FACs for heterologous expression in a modified strain of Aspergillus nidulans (heterologous host, FAC-AnHH). With this methodology, we created ten BGC-FACs that yielded fourteen different SMs, including strobilurin, which was previously isolated exclusively from basidiomycetes. Eleven of these compounds were not detected in the extracts of the FAC-AnHH. Of this discrete set, only the novel compound citreohybriddional had been isolated from either Penicillium sp. before and only at very low yield. We propose that through heterologous expression, FACs activated these silent BGCs, resulting in the synthesis of new natural products (NPs) with yields as high as 50%–60% of the crude organic extracts.


FAC Pooling and Illumina-index-sequencing of FAC pools
Individual FACs were grown in 15 mL tubes. For the FAC libraries, each FAC clone from the first 10 plates of the P. fuscum (PW2A) and P. camemberti (PW2B) FAC libraries was duplicated into a 384 deep-well plate containing Terrific Broth (TB) medium. TB medium: yeast extract, 24 g, and tryptone, 20 g, dissolved in 900 mL; phosphate buffer: 0.17 M KH2PO4 and 0.72 M K2HPO4 in 100 mL (0.017 M KH2PO4 and 0.072 M K2HPO4 final), autoclaved separately, then mixed together after cooling to room temperature. At this point, 8 mL of filter-sterilized 50% glycerol (with chloramphenicol to 12.5 µg/mL and arabinose to 0.01%) was added. The duplicated FAC plates were grown in a shaking incubator at 37 °C, 200 rpm for 24 h. Individual FAC-DNAs, or FAC-DNA pools made by combining individually grown FAC cells, were prepared using a common alkaline plasmid/BAC-DNA isolation method. Each FAC-DNA or pool was dissolved in 300 µL of 10 mM Tris-HCl (pH 8.0). In addition to the individual FAC-DNAs, twenty FAC plate-pools (A1-A10, B1-B10) were created, along with 40 sub-pools (16 row pools and 24 column pools) for each 384-well FAC plate.
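The row/column sub-pool layout described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the pool names, the function names, and the idea of intersecting positive row and column pools to locate a clone are assumptions introduced here.

```python
import string

# A 384-well plate has 16 rows (A-P) and 24 columns (1-24).
ROWS = string.ascii_uppercase[:16]
COLS = range(1, 25)


def sub_pools(plate):
    """Map each (hypothetical) sub-pool name to the wells it contains."""
    pools = {}
    for r in ROWS:
        pools[f"{plate}-row{r}"] = [f"{r}{c}" for c in COLS]
    for c in COLS:
        pools[f"{plate}-col{c}"] = [f"{r}{c}" for r in ROWS]
    return pools


def candidate_wells(positive_rows, positive_cols):
    """Wells lying at the intersection of positive row and column pools."""
    return [f"{r}{c}" for r in positive_rows for c in positive_cols]


pools = sub_pools("A1")
print(len(pools))                   # 16 row pools + 24 column pools
print(candidate_wells(["C"], [5]))  # a single row/column hit names one well
```

With one positive row pool and one positive column pool, the intersection is a single well; with several positives the list contains every combination, which would then need to be resolved by sequencing the individual clones.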
Illumina TruSeq indexing libraries of individual FAC-DNAs or FAC pool DNAs were prepared using Intact Genomics igNext NGS library kits (Intact Genomics). Briefly, FAC-DNA pools or individual FAC-DNAs were sheared to ~500 bp DNA fragments by sonication (Q800R sonicator, Qsonica) and normalized to 25 ng/µL. Approximately 100 ng of fragmented DNA was then end-repaired, dA-tailed, and adaptor-ligated, with bead clean-up between steps, using Intact Genomics igNext NGS kits to construct the initial Illumina libraries. Single-index primers (24 indexes available) or dual-index primers (96 x 96 = 9,216 dual-index combinations available) can be used to generate the final indexed Illumina libraries with minimal amplification (12-15 cycles). Indexed Illumina libraries were qualified and quantified with both a Qubit 3 (Life Technologies/Invitrogen) and an Agilent 2100 Bioanalyzer. We initially used single indexing for the 20 FAC pooled DNAs, and dual indexing for individual FACs and later FAC-DNA pools.
We also tested and compared the NGS library kits of Illumina, New England Biolabs, and Intact Genomics igNext, and found that Intact Genomics igNext was equal to or better than the other NGS library kits. Moving forward, all Illumina indexing libraries were prepared with Intact Genomics igNext NGS kits. The initial single-indexing step was the creation of Illumina libraries of the 20 FAC-DNA pools. We performed two Illumina MiSeq runs with v3 chemistry (2 x 300 bp) and generated ~32 Gb of sequencing data for individual FACs. We combined the dual-indexing Illumina libraries with other sequencing projects, balanced >1,300 dual-indexing Illumina libraries using an additional MiSeq run (v3, 2 x 75 bp), and generated sufficient coverage of all libraries (at least 100x) with a HiSeq X Ten lane producing 360-400M reads per lane (110-120 Gb data output per lane).
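The lane output and coverage figures can be checked with simple arithmetic. The sketch below is not from the authors' workflow: it assumes that the 360-400M figure counts read pairs and that the HiSeq X Ten run was 2 x 150 bp (neither is stated in the text), and the function names are hypothetical.

```python
def lane_output_bases(read_pairs, read_length_bp, reads_per_pair=2):
    """Total bases from one lane (assumes paired-end reads of equal length)."""
    return read_pairs * read_length_bp * reads_per_pair


def bases_needed(coverage, target_size_bp):
    """Bases required to reach a given mean coverage of one target."""
    return coverage * target_size_bp


# Assumed 2 x 150 bp reads; 360M and 400M read pairs bracket the stated range.
low = lane_output_bases(360_000_000, 150)
high = lane_output_bases(400_000_000, 150)
print(low / 1e9, high / 1e9)  # lane output in Gb

# Bases needed for 100x coverage of a single ~100 kb FAC insert.
print(bases_needed(100, 100_000))
```

Under these assumptions the lane yields roughly 108-120 Gb, consistent with the stated 110-120 Gb, and 100x coverage of one ~100 kb FAC needs only about 10 Mb of sequence. FAC pools contain many inserts and need proportionally more.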
The raw MiSeq/HiSeq reads for each indexed FAC or FAC pool were imported into an in-house platform based on the Unicycler package (1) for trimming at a stringency of 0.01% error rate (equivalent to a Q score of 40 or higher). The trimmed sequences of each indexed (barcoded) FAC or FAC pool were then binned and subjected to independent de novo assembly, yielding a single contig (~100 kb) per FAC or a set of contigs per FAC pool. When necessary, we used a contemporary all-purpose short-read assembler, such as SPAdes (3), to evaluate performance for both individual FACs and FAC pools. Assembly metrics were evaluated with QUAST and MetaQUAST (4). An auto-assembly, annotation and antiSMASH pipeline (5) was set up for the sequencing data analysis and for prediction of the entire set of large SM biosynthetic gene clusters (BGCs) of P. fuscum (PW2A) and P. camemberti (PW2B), as well as the BGCs in individual FACs.
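The stated equivalence between a 0.01% error rate and a Q score of 40 follows from the standard Phred definition, Q = -10 log10(p). A minimal sketch of the conversion (the function names are ours and are not part of the trimming platform):

```python
import math


def error_rate_to_phred(p):
    """Phred quality score for a per-base error probability p."""
    return -10 * math.log10(p)


def phred_to_error_rate(q):
    """Per-base error probability for a Phred quality score q."""
    return 10 ** (-q / 10)


# 0.01% error rate is a probability of 0.0001.
print(round(error_rate_to_phred(0.0001)))  # 40
print(phred_to_error_rate(30))             # about 0.001, i.e. 0.1%
```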

Tables and figures
Table S1. The masses of the crude CHCl3 extracts of each transformant and the purified masses of each natural product produced by the FAC transformants in fermentation experiments are shown, along with calculated % yields. *Both FACPKS-1L15-1 and FACPKS-2J5-1 produced asperugin A (11) and B (12), but these were only detected in the crude extract by analysis of the LC/MS data and were not isolated, purified or quantified. The TIC is a plot of the sum of all signal intensities of a single scan spectrum against time or scan number. Liquid chromatography/mass spectrometry (LC/MS) experiments were run on an Agilent 6520 Q-TOF LC/MS using a Phenomenex Gemini NX-C18 column. The LC was run in reverse-phase gradient mode from 50% CH3CN/H2O with 0.1% formic acid to 100% CH3CN over 15 minutes, then held at 100% CH3CN for 4 minutes. All solvents used were spectral grade or distilled prior to use. Agilent MassHunter Workstation was used for data analysis.
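For readers who want to relate a retention time to the programmed mobile phase, the gradient can be written as a piecewise function. This sketch assumes the ramp from 50% to 100% CH3CN is linear, which the text does not state, and it ignores instrument dwell volume; the function name is hypothetical.

```python
def percent_acetonitrile(t_min):
    """Nominal % CH3CN at time t_min for the 19-minute method.

    Assumes a linear ramp from 50% to 100% over 0-15 min,
    followed by a hold at 100% from 15 to 19 min.
    """
    if t_min < 0 or t_min > 19:
        raise ValueError("time is outside the 0-19 minute run")
    if t_min <= 15:
        return 50 + (100 - 50) * t_min / 15
    return 100.0


for t in (0, 7.5, 15, 19):
    print(t, percent_acetonitrile(t))
```

Under the same linear assumption, a peak eluting at 13.8 minutes corresponds to a programmed composition of about 96% CH3CN.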
Some of the FAC-Tr LC/MS TIC traces differ dramatically from that of the heterologous host (Figure S1A), while others are quite similar and indicate the presence of overlapping SMs. Some of the most notable differences are between the TIC of the FAC-AnHH (Figure S1A) and those in Figure S1B-D, which have large peaks associated with the SM asperlin (9), which has an [M+1]+ of 213 amu. This peak is highlighted in Figure S1B-D but is not present in Figure S1A. Figure S1E has a large peak at 13.8 minutes that correlates with strobilurin G (2) and F (3), which were not detected in the FAC-AnHH or in any other FAC-Tr. In Figure S1F, peaks associated with sequoiamonascin (4) and penicillide (7) are clearly seen, but they are not detectable in the FAC-AnHH or in other extracts. Several extracts show clear evidence of asperugin A (11) and B (12) in the crude extract, but again, these compounds are not discernible in the heterologous host.
Correlation spectroscopy experiments (HMBC, COSY and HSQC) are powerful tools in structure elucidation. HSQC enabled the creation of the above chart correlating protons and carbons that were directly coupled to each other. Figure S3e shows proton-proton correlations in the COSY spectrum and proton-carbon correlations in the HMBC spectrum. The HMBC spectrum is optimized for 3-4 bond C-H correlations.
Comparison of the 2bFAC4K13 ctr/adr BGC with homologous BGCs of example fungal multi-ring meroterpenoids that have been published or deposited in the NCBI database. Lines indicate homology between the connected genes. For the remaining BGC-containing FACs, no homologous BGC with known compound(s) was found in the GenBank database. Interestingly, the BGC of 2bFACPKS-9M19 has protein sequences with 70%-90% identity to 6 genes of the 11-member macrophorin (mac) BGC of Penicillium terrestris LM254.

Figure S1. Analysis of crude extracts with LC/MS. These are the TIC (Total Ion Chromatogram) traces (A-L) of the crude CHCl3 extracts of the FAC-AnHH and FAC-Trs used in this study.

Figure S2d. HSQC of (1). The 1H-13C Heteronuclear Single Quantum Coherence spectrum allows determination of direct proton-carbon connectivity. The HSQC experiment is used to determine proton-carbon single-bond correlations, where the protons lie along the observed F2 (X) axis and the carbons lie along the F1 (Y) axis.

Figure S2e. HMBC spectrum of (1). The 1H-13C Heteronuclear Multiple Bond Correlation spectrum provides information about 2-4 bond coupling between carbons and hydrogens. HMBC spectroscopy correlates 1H and 13C nuclei through two, three, or sometimes four bonds. The protons lie along the observed F2 (X) axis and the carbons lie along the F1 (Y) axis.

Figure S3d. 1H-13C HSQC (Heteronuclear Single Quantum Coherence) spectrum of strobilurin G isolated from 2bFACPKS-5A24-3B. The HSQC experiment is used to determine proton-carbon single-bond correlations, where the protons lie along the observed F2 (X) axis and the carbons lie along the F1 (Y) axis.

Figure S4. Important correlations from the COSY spectrum (Figure S3b), shown in red, and from the HMBC spectrum (Figure S3c), shown in blue, facilitated the structural assembly of strobilurin G (2).

Figure S6. Sequoiamonascin D (10) and sequoiatones A (11) and F (12) were synthesized by FAC heterologous expression of FACPKS-10E3-2B. The PKS gene shares 29% protein sequence identity with the 2362MpPKS5 gene of the Monascus pilosus azaphilone pigment BGC, which is associated with the synthesis of azaphilone-type pigments, including rubropunctatin and rubropunctamine. Sequoiamonascin D and sequoiatones A and F were originally isolated from the endophyte Aspergillus parasiticus, which was harvested from the bark of Sequoia sempervirens.

Figure S7. The structure of penicillide, discovered and elucidated by rapid FAC heterologous expression, shown with the structurally similar published compounds monodictyophenone and pestheic acid.

Figure S10. CHEF gels of the E. coli-Aspergillus shuttle FAC libraries PW2A (Penicillium fuscum) and PW2B (P. camembertii/clavigerum). An average insert size of 110 kb was estimated from >150 randomly picked FAC clones from both libraries. The middle lanes (M) are Lambda ladder DNA markers; each of the remaining lanes represents a random FAC clone that was completely digested with NotI.